The WebML dataset consists of twelve WebML projects taken from real-world WebML applications.
Figure~\ref{fig:webml_dataset_project_example} shows a fragment of a WebML project: an area called \emph{Shops} containing three pages and some units.
\begin{figure}[ht]
  \begin{center}
	\includegraphics[width=0.8\textwidth]{./pictures/webml_dataset_project_example}
	\caption{Example of an area of a project from the WebML dataset.}
	\label{fig:webml_dataset_project_example}
  \end{center}
\end{figure}
The listing below shows an abridged XML representation of the area:
\begin{lstlisting}
<?xml version="1.0" encoding="UTF-8"?>

<Area id="sv3g#area5g" name="Shops" defaultPage="sv3g#area5g#page20g" landmark="true">  
  <OperationUnits> 
    <CreateUnit id="sv3g#area5g#cru11g" name="Save DC" entity="ent2"> 
      <OKLink id="sv3g#area5g#cru11g#oln48g" name="Link OK 44" to="sv3g#area5g#page20g" automaticCoupling="true"/> 
    </CreateUnit>  
    <NoOpOperationUnit id="sv3g#area5g#opu13g" name="nop"> 
      <OKLink id="sv3g#area5g#opu13g#oln49g" name="Link OK 45" to="sv3g#area5g#page20g" automaticCoupling="true"/> 
    </NoOpOperationUnit>  
    ...
  </OperationUnits>
  <Page id="sv3g#area5g#page22g" name="Modify Shop">  
    <ContentUnits> 
      <DataUnit id="sv3g#area5g#page22g#dau7g" name="Modify Shop" entity="ent2"> 
        <Link id="sv3g#area5g#page22g#dau7g#ln107g" name="Link 99" to="sv3g#area5g#page22g#enu22g" type="transport" automaticCoupling="false" validate="true">
        </Link>
        <Link id="sv3g#area5g#page22g#dau7g#ln31g" name="Link 31" to="sv3g#area5g#mfu4g" type="transport" automaticCoupling="true" validate="true"/>
      </DataUnit>  
      <EntryUnit id="sv3g#area5g#page22g#enu22g" linkOrder="sv3g#area5g#page22g#enu22g#ln108g sv3g#area5g#page22g#enu22g#ln109g" name="Modify Store"> 
        <Field id="sv3g#area5g#page22g#enu22g#fld45g" name="Name" type="string" modifiable="true" preloaded="true">
        </Field>
        ...
      </EntryUnit> 
    </ContentUnits>
    <layout:Grid>
       ...
    </layout:Grid>
  </Page>
  <layout:Grid>
    ...
  </layout:Grid>
</Area>
\end{lstlisting}
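
The structure of such an area file can be inspected with a few lines of code. The following Python sketch is only an illustration and is not part of the dataset tooling: it works on a simplified copy of the listing above, in which the elided parts and the \texttt{layout:Grid} elements are left out so that the fragment is well-formed on its own. The helper names \texttt{summarize\_area} and \texttt{id\_path} are hypothetical. The second helper relies on the observation, taken from the listing, that the identifier of an element extends the identifier of its parent with a \texttt{\#}-separated segment.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the listing: the elided parts ("...") and the
# layout:Grid elements are omitted so the fragment parses on its own.
AREA_XML = """\
<Area id="sv3g#area5g" name="Shops" defaultPage="sv3g#area5g#page20g" landmark="true">
  <OperationUnits>
    <CreateUnit id="sv3g#area5g#cru11g" name="Save DC" entity="ent2">
      <OKLink id="sv3g#area5g#cru11g#oln48g" name="Link OK 44" to="sv3g#area5g#page20g" automaticCoupling="true"/>
    </CreateUnit>
    <NoOpOperationUnit id="sv3g#area5g#opu13g" name="nop">
      <OKLink id="sv3g#area5g#opu13g#oln49g" name="Link OK 45" to="sv3g#area5g#page20g" automaticCoupling="true"/>
    </NoOpOperationUnit>
  </OperationUnits>
  <Page id="sv3g#area5g#page22g" name="Modify Shop">
    <ContentUnits>
      <DataUnit id="sv3g#area5g#page22g#dau7g" name="Modify Shop" entity="ent2"/>
      <EntryUnit id="sv3g#area5g#page22g#enu22g" name="Modify Store">
        <Field id="sv3g#area5g#page22g#enu22g#fld45g" name="Name" type="string"/>
      </EntryUnit>
    </ContentUnits>
  </Page>
</Area>
"""


def summarize_area(xml_text):
    """Return the area name plus the names of its pages and units."""
    area = ET.fromstring(xml_text)
    return {
        "area": area.get("name"),
        "pages": [p.get("name") for p in area.iter("Page")],
        # Direct children of <OperationUnits> are the operation units.
        "operation_units": [u.get("name") for u in area.findall("./OperationUnits/*")],
        # Direct children of any <ContentUnits> are the content units.
        "content_units": [u.get("name") for u in area.findall(".//ContentUnits/*")],
    }


def id_path(element_id):
    """Split a hierarchical id such as 'sv3g#area5g#page22g' into its segments."""
    return element_id.split("#")


summary = summarize_area(AREA_XML)
print(summary)
print(id_path("sv3g#area5g#page22g#dau7g"))
```

Because the complete files contain the \texttt{layout:} prefix, parsing them in the same way would additionally require the corresponding namespace declaration, which is not visible in the fragment shown here.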

\newpage

WebML models express a larger set of concepts (whole projects, site views, areas, pages, content units, operation units, links) than UML models (whole projects, packages, classes and attributes). This results in a more complex XML representation for WebML projects, though the information remains well structured.
On average the projects are very large: they contain large areas with hundreds of pages and units, and only a few projects are small. Because of this size, and because of the concurrency issues that arise when these models are edited in a real industrial environment, the XML representation of a whole project is split into several files according to the metamodel element (i.e., there are separate XML files for site views, areas, pages, etc.).

Figure~\ref{fig:webml_dataset_terms_distribution} depicts the frequency distribution of the terms in the WebML dataset (10 projects). The distribution is shown for the first two hundred terms only.
\begin{figure}[htbp]
  \begin{center}
	\includegraphics[width=0.5\textwidth]{./pictures/WebML_dataset_term_distribution.eps}
	\caption{The frequency distribution of terms of the WebML dataset.}
	\label{fig:webml_dataset_terms_distribution}
  \end{center}
\end{figure}

As for the UML dataset (Section~\ref{uml-dataset}), the distribution of terms in the WebML dataset approximates a power-law function and therefore follows Zipf's law. However, in this case the shape of the curve is less pronounced. This is due to the number of projects, and therefore of terms, in the two datasets: the UML dataset contains many more models than the WebML dataset.
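
A rank-frequency curve of this kind, and a rough check of how closely it follows a power law, can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the function names \texttt{rank\_frequency} and \texttt{loglog\_slope} are hypothetical, the input is a synthetic list of terms rather than the WebML dataset, and the way terms are extracted from the models is not covered. Under Zipf's law the frequency of a term is roughly inversely proportional to its rank, so the least-squares slope of $\log(\mathit{frequency})$ against $\log(\mathit{rank})$ should be close to $-1$.

```python
import math
from collections import Counter


def rank_frequency(terms, top_k=200):
    """Return the frequencies of the top_k most common terms, highest first."""
    return [count for _, count in Counter(terms).most_common(top_k)]


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic input, NOT taken from the dataset: term i occurs 120 // i times,
# which mimics an idealised Zipf-like distribution over 20 terms.
toy_terms = [f"term{i}" for i in range(1, 21) for _ in range(120 // i)]

freqs = rank_frequency(toy_terms)
slope = loglog_slope(freqs)
print(freqs[:5], round(slope, 2))
```

A slope that is noticeably flatter or steeper than $-1$, or a log-log plot that bends visibly, would indicate a weaker fit to Zipf's law, which is one way to quantify a less pronounced curve.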